Search for: All records where Creators/Authors contains: "Pan, Shimei"

Note: Clicking a Digital Object Identifier (DOI) number takes you to an external site maintained by the publisher. Some full-text articles may not be available free of charge during the publisher's embargo period.

Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.

  1. As demand grows for job-ready data science professionals, there is increasing recognition that traditional training often falls short in cultivating the higher-order reasoning and real-world problem-solving skills essential to the field. A foundational step toward addressing this gap is the identification and organization of knowledge components (KCs) that underlie data science problem solving (DSPS). KCs represent conditional knowledge: knowing which actions are appropriate given particular contexts or conditions, corresponding to the critical decisions data scientists must make throughout the problem-solving process. While existing taxonomies in data science education support curriculum development, they often lack the granularity and focus needed to support the assessment and development of DSPS skills. In this paper, we present a novel framework that combines the strengths of large language models (LLMs) and human expertise to identify, define, and organize KCs specific to DSPS. We treat LLMs as "knowledge engineering assistants" capable of generating candidate KCs by drawing on their extensive training data, which includes a vast amount of domain knowledge and diverse sets of real-world DSPS cases. Our process involves prompting multiple LLMs to generate decision points, synthesizing and refining KC definitions across models, and using sentence-embedding models to infer the underlying structure of the resulting taxonomy. Human experts then review and iteratively refine the taxonomy to ensure validity. This human-AI collaborative workflow offers a scalable and efficient proof-of-concept for LLM-assisted knowledge engineering. The resulting KC taxonomy lays the groundwork for developing fine-grained assessment tools and adaptive learning systems that support deliberate practice in DSPS. Furthermore, the framework illustrates the potential of LLMs not just as content generators but as partners in structuring domain knowledge to inform instructional design. Future work will involve extending the framework by generating a directed graph of KCs based on their input-output dependencies and validating the taxonomy through expert consensus and learner studies. This approach contributes to both the practical advancement of DSPS coaching in data science education and the broader methodological toolkit for AI-supported knowledge engineering.
    Free, publicly-accessible full text available July 17, 2026
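    The abstract above describes prompting multiple LLMs for candidate KCs and then using sentence-embedding models to infer the structure of the resulting taxonomy. Below is a minimal, hypothetical sketch of that embedding-and-clustering step only, not the authors' code: the embedding model, the example KC definitions, and the cluster count are all assumptions chosen for illustration.

        # Illustrative sketch (not the paper's pipeline): cluster candidate KC
        # definitions with a sentence-embedding model to propose a draft grouping
        # that human experts would then review and refine.
        from sentence_transformers import SentenceTransformer
        from sklearn.cluster import AgglomerativeClustering

        # Hypothetical candidate KCs, e.g., as generated by prompting several LLMs.
        candidate_kcs = [
            "Decide whether to impute or drop records with missing values",
            "Choose an evaluation metric appropriate for class imbalance",
            "Select a train/validation/test split strategy for time-series data",
            "Decide when to normalize or standardize numeric features",
        ]

        model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
        embeddings = model.encode(candidate_kcs, normalize_embeddings=True)

        # Group semantically similar KC definitions; the cluster count here is an
        # arbitrary illustration, not a value taken from the paper.
        clusterer = AgglomerativeClustering(n_clusters=2, metric="cosine",
                                            linkage="average")
        labels = clusterer.fit_predict(embeddings)

        for label, kc in sorted(zip(labels, candidate_kcs)):
            print(label, kc)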
  2. Currently, there is a surge of interest in fair Artificial Intelligence (AI) and Machine Learning (ML) research, which aims to mitigate discriminatory bias in AI algorithms, e.g., along lines of gender, age, and race. While most research in this domain focuses on developing fair AI algorithms, in this work we examine the challenges that arise when humans and fair AI interact. Our results show that, due to an apparent conflict between human preferences and fairness, a fair AI algorithm on its own may be insufficient to achieve its intended results in the real world. Using college major recommendation as a case study, we build a fair AI recommender by employing gender-debiasing machine learning techniques. Our offline evaluation showed that the debiased recommender makes fairer career recommendations without sacrificing predictive accuracy. Nevertheless, an online user study of more than 200 college students revealed that participants on average prefer the original biased system over the debiased system. Specifically, we found that perceived gender disparity is a determining factor in the acceptance of a recommendation. In other words, we cannot fully address the gender bias issue in AI recommendations without addressing the gender bias in humans. We conducted a follow-up survey to gain additional insights into the effectiveness of various design options that can help participants overcome their own biases. Our results suggest that making fair AI explainable is crucial for increasing its adoption in the real world.
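    The abstract above reports building a fair recommender with gender-debiasing machine learning techniques but does not spell out the technique itself. Purely as a hedged illustration of one common family of approaches (projecting a gender-correlated direction out of user representations before training the recommender), consider the sketch below; the feature vectors and the "gender direction" are toy assumptions, not the study's data or method.

        # Illustrative sketch only: neutralize the linear gender component of user
        # representations. This is one generic debiasing idea, not necessarily the
        # technique used in the paper; all values below are toy assumptions.
        import numpy as np

        def remove_direction(vectors: np.ndarray, direction: np.ndarray) -> np.ndarray:
            """Project each row of `vectors` onto the subspace orthogonal to `direction`."""
            d = direction / np.linalg.norm(direction)
            return vectors - np.outer(vectors @ d, d)

        # Toy user feature vectors and a hypothetical "gender direction"
        # (e.g., the difference between group mean vectors).
        users = np.random.default_rng(0).normal(size=(5, 8))
        gender_direction = np.random.default_rng(1).normal(size=8)

        debiased_users = remove_direction(users, gender_direction)

        # After projection, the representations carry no linear gender signal,
        # which is the intuition behind this kind of debiasing step.
        d = gender_direction / np.linalg.norm(gender_direction)
        print(np.allclose(debiased_users @ d, 0.0))  # True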
  3. We propose definitions of fairness in machine learning and artificial intelligence systems that are informed by the framework of intersectionality, a critical lens from the legal, social science, and humanities literature which analyzes how interlocking systems of power and oppression affect individuals along overlapping dimensions including gender, race, sexual orientation, class, and disability. We show that our criteria behave sensibly for any subset of the set of protected attributes, and we prove economic, privacy, and generalization guarantees. Our theoretical results show that our criteria meaningfully operationalize AI fairness in terms of real-world harms, making the measurements interpretable in a manner analogous to differential privacy. We provide a simple learning algorithm using deterministic gradient methods, which respects our intersectional fairness criteria. The measurement of fairness becomes statistically challenging in the minibatch setting due to data sparsity, which grows rapidly with the number of protected attributes and with the number of values each protected attribute can take. To address this, we further develop a practical learning algorithm using stochastic gradient methods, which incorporates stochastic estimation of the intersectional fairness criteria on minibatches to scale up to big data. Case studies on census data, the COMPAS criminal recidivism dataset, the HHP hospitalization data, and a loan application dataset from HMDA demonstrate the utility of our methods.
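    As a rough, hedged illustration of what measuring fairness over intersectional subgroups can look like (not a verbatim implementation of the paper's criteria), the sketch below computes a worst-case gap in smoothed positive-prediction rates across every combination of protected-attribute values; the smoothing term also hints at why sparse minibatches make this estimation statistically difficult. The function name and the toy data are assumptions for illustration.

        # Illustrative sketch: worst-case log-rate gap across intersectional
        # subgroups (all combinations of protected-attribute values). A generic
        # reading of the kind of criterion described above, not the paper's code.
        from itertools import product
        import numpy as np

        def intersectional_rate_gap(y_pred, groups, smoothing=1.0):
            """Max gap in log positive-prediction rates over intersectional subgroups.

            y_pred : 1-D array of 0/1 predictions
            groups : 2-D array, one column per protected attribute
            """
            y_pred = np.asarray(y_pred)
            groups = np.asarray(groups)
            values_per_attr = [np.unique(groups[:, j]) for j in range(groups.shape[1])]
            rates = []
            for combo in product(*values_per_attr):  # every intersectional subgroup
                mask = np.all(groups == np.array(combo), axis=1)
                # Smoothed rate so tiny or empty subgroups (the sparsity problem
                # mentioned in the abstract) do not produce log(0).
                rates.append((y_pred[mask].sum() + smoothing) / (mask.sum() + 2 * smoothing))
            log_rates = np.log(rates)
            return float(log_rates.max() - log_rates.min())

        # Toy example: two protected attributes (e.g., coded gender and race).
        y_hat = np.array([1, 0, 1, 1, 0, 1, 0, 0])
        attrs = np.array([[0, 0], [0, 0], [0, 1], [0, 1], [1, 0], [1, 0], [1, 1], [1, 1]])
        print(intersectional_rate_gap(y_hat, attrs))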
  4. There is growing awareness that AI and machine learning systems can in some cases learn to behave in unfair and discriminatory ways with harmful consequences. However, despite an enormous amount of research, techniques for ensuring AI fairness have yet to see widespread deployment in real systems. One of the main barriers is the conventional wisdom that fairness brings a cost in predictive performance metrics such as accuracy, which could affect an organization's bottom line. In this paper we take a closer look at this concern. Clearly, fairness/performance trade-offs exist, but are they inevitable? In contrast to the conventional wisdom, we find that it is frequently possible, indeed straightforward, to improve on a trained model's fairness without sacrificing predictive performance. We systematically study the behavior of fair learning algorithms on a range of benchmark datasets, showing that it is possible to improve fairness to some degree with no loss (or even an improvement) in predictive performance via a sensible hyper-parameter selection strategy. Our results reveal a pathway toward increasing the deployment of fair AI methods, with potentially substantial positive real-world impacts.
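    One way to read the "sensible hyper-parameter selection strategy" mentioned above is: among candidate configurations whose validation accuracy falls within a small tolerance of the best observed accuracy, pick the one with the lowest measured unfairness. The sketch below illustrates that interpretation with made-up numbers; it is an assumption about the idea, not the paper's exact procedure, and the hypothetical unfairness score could be any disparity metric.

        # Illustrative sketch: fairness-aware hyper-parameter selection. Among
        # configurations within `tolerance` of the best validation accuracy,
        # choose the fairest one. The candidate results are made-up numbers.
        candidates = [
            {"params": {"C": 0.1}, "accuracy": 0.84, "unfairness": 0.20},
            {"params": {"C": 1.0}, "accuracy": 0.86, "unfairness": 0.15},
            {"params": {"C": 10.0}, "accuracy": 0.85, "unfairness": 0.04},
        ]

        def select_model(results, tolerance=0.01):
            """Pick the fairest configuration within `tolerance` of the top accuracy."""
            best_acc = max(r["accuracy"] for r in results)
            acceptable = [r for r in results if r["accuracy"] >= best_acc - tolerance]
            return min(acceptable, key=lambda r: r["unfairness"])

        # Here the C=10.0 model is selected: near-top accuracy, far lower unfairness.
        print(select_model(candidates))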